Predicting compressive strength of eco-friendly plastic sand paver blocks using gene expression and artificial intelligence programming

Plastic sand paver blocks provide a sustainable alternative by using plastic waste and reducing the need for cement. This approach contributes to a more sustainable construction sector by promoting environmental preservation. No model or equation has yet been devised that can predict the compressive strength of these blocks. To advance the field, this study used gene expression programming (GEP) and multi-expression programming (MEP) to develop empirical models that forecast the compressive strength of plastic sand paver blocks (PSPB) composed of plastic, sand, and fibre. The database contains 135 compressive strength results with seven input parameters. The R 2 values of 0.87 for GEP and 0.91 for MEP reveal a relatively strong relationship between predicted and actual values. MEP outperformed GEP, with a higher R 2 and lower values of the statistical error measures. In addition, a sensitivity analysis revealed that sand grain size and fibre percentage play an essential part in compressive strength, together contributing almost 50% of the total. The outcomes of this research have the potential to promote the use of PSPB in the building of green environments, hence boosting environmental protection and economic advantage.

Gene expression programming. To address the need for an alternative to the fixed-length binary strings used in genetic algorithms, Koza presented the genetic programming (GP) technique 63. The GP methodology defines five main parameters: the set of terminals, the set of primitive functions, the fitness measure, the control variables, and the termination criterion together with the method for designating the result 63. GP is a flexible programming method because it evolves non-linear structures that resemble parse trees, so it assumes non-linearity in the data from the outset. Similar non-linearities have been employed in the past 63,64. A major shortcoming of GP is that it has no independent genome: the genotype and phenotype share the same non-linear structure, which makes it unlikely that simple, compact expressions will result. To address the shortcomings of the GP approach, Ferreira proposed the GEP method 63. A major change in GEP is that only the genome is passed from one generation to the next. Another notable feature is that each entity is formed from an individual chromosome containing several genes 65 ... symbol is stabilized in genetic code operators. The data required to build an empirical model are encoded in the chromosomes, and a language called Karva is used to read them. The steps involved in GEP are depicted in Fig. 2. Starting with randomly generated chromosomes of the same length for each individual, the approach converts them into expression trees (ETs) and calculates a fitness value for every individual. Reproduction with new individuals continues for several generations until desirable outcomes are reached. Populations are modified by genetic operations such as crossover, reproduction, and mutation.
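How a Karva-encoded gene maps to an expression tree can be illustrated with a minimal sketch. The Python below is not the authors' implementation; it assumes single-character symbols, a hypothetical function set in which `L` stands for Ln, and hypothetical terminals `a` to `d`. The gene is read from left to right and the tree is filled level by level, so any symbols left over at the end of the gene are simply not expressed.

```python
import math

# Number of arguments of each function symbol (assumed set; "L" is Ln).
# Any symbol not listed here is treated as a terminal (input variable).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "L": 1}


def decode_karva(gene):
    """Build an expression tree from a gene by reading it breadth-first."""
    root = [gene[0], []]          # node = [symbol, list of child nodes]
    level = [root]                # nodes on the current tree level
    pos = 1                       # index of the next unread symbol
    while level:
        next_level = []
        for node in level:
            for _ in range(ARITY.get(node[0], 0)):
                child = [gene[pos], []]
                pos += 1
                node[1].append(child)
                next_level.append(child)
        level = next_level
    return root


def evaluate(node, values):
    """Evaluate an expression tree for a dict of terminal values."""
    symbol, children = node
    if symbol not in ARITY:
        return values[symbol]
    args = [evaluate(child, values) for child in children]
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "/":
        return args[0] / args[1]
    return math.log(args[0])      # "L"


# "*+-abcd" encodes (a + b) * (c - d); the trailing "ab" is non-coding.
tree = decode_karva("*+-abcdab")
result = evaluate(tree, {"a": 1.0, "b": 2.0, "c": 5.0, "d": 3.0})
```

In GEP the head and tail structure of a gene guarantees that enough terminals are available to complete the tree, which is why this decoder does not check for running past the end of the gene.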
Multi expression programming. MEP is a well-established linear variant of GP that uses linear chromosomes to encode solutions. Its working mechanism is similar to that of GEP. A key feature of MEP, which sets it apart within the GP family, is its ability to encode multiple programmes (solutions) on a single chromosome 66. The best of these is then selected by comparing fitness values to produce the final result. According to Oltean and Grosan 67, two parents are chosen by binary tournament and recombined to produce two offspring, which are then mutated. The procedure is repeated over successive generations until the termination criterion is met and the best programme has been found. The MEP model, like the GEP model, allows for parameter fitting. The key variables that govern multi-expression programming are the code length, the number of subpopulations, the crossover probability, the subpopulation size, and the function set 68. The population size is the total number of programmes, so the computation time and complexity grow as the number of subpopulations increases. In addition, the code length has a major impact on the size of the resulting mathematical expressions. Figure 3 shows the steps involved in the MEP technique.
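The idea of storing several candidate solutions on one chromosome can be sketched as follows. This is an illustrative Python example, not the authors' code: the gene layout (a terminal name, or an operator with two indices pointing to earlier genes), the operator table `OPS`, and the absolute-error fitness are assumptions chosen for brevity.

```python
import operator

# Assumed function set for the illustration.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate_chromosome(chromosome, inputs):
    """Return the value of the expression encoded by every gene."""
    values = []
    for gene in chromosome:
        if isinstance(gene, str):
            # Terminal gene: the name of an input variable.
            values.append(inputs[gene])
        else:
            # Function gene: (operator, i, j), where i and j index earlier genes.
            op, i, j = gene
            values.append(OPS[op](values[i], values[j]))
    return values


def best_gene(chromosome, samples):
    """Return (index, error) of the gene with the lowest total absolute error."""
    errors = [0.0] * len(chromosome)
    for inputs, target in samples:
        for k, value in enumerate(evaluate_chromosome(chromosome, inputs)):
            errors[k] += abs(value - target)
    best = min(range(len(chromosome)), key=errors.__getitem__)
    return best, errors[best]


# One chromosome, five expressions: a, b, a+b, (a+b)^2, (a+b)^2 - a.
chromosome = ["a", "b", ("+", 0, 1), ("*", 2, 2), ("-", 3, 0)]
samples = [({"a": 1.0, "b": 2.0}, 9.0), ({"a": 0.0, "b": 3.0}, 9.0)]
index, error = best_gene(chromosome, samples)
```

Because every gene is evaluated in a single pass, the fitness of the chromosome can be taken as that of its best gene, which is the property the paragraph above describes.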
Comparison of GEP and MEP. Historical data sets are generally used throughout the assessment and modeling phases of each of the aforementioned genetic programming approaches 69,70. GEP and MEP are often regarded as the most prominent linear GP methods for estimating the compressive strength of concrete composites. The operating mechanism of GEP is more complex than that of MEP 68. In contrast to GEP, the non-coding parts of an MEP chromosome can be located anywhere on the chromosome. Additionally, references to function arguments are stored explicitly in MEP 57,67. Because of these differences, the MEP representation is better suited to code reuse, although it is less compact than GEP. A typical GEP chromosome, in turn, consists of a head and a tail, both of which contain symbols that always encode syntactically correct computer programmes, and this has been cited as evidence that GEP is the more effective of the two. As a consequence, more study is needed to evaluate the efficacy and applicability of both GP approaches to a particular engineering problem. Both GEP and MEP are used to find solutions to optimization problems; however, MEP focuses primarily on identifying a single equation that solves the problem, whereas GEP is more focused on modeling and approximating data.
The benefits of MEP are as follows 71,72. MEP uses many expression sets rather than a single expression. As a result, the material strength prediction model can be broken down into its constituent parts, or phases, which can then be more easily understood and analysed. The computing time required to evolve the model and estimate concrete strength can be drastically decreased by evaluating the expression sets in parallel, which becomes very useful when dealing with very large datasets. MEP also incorporates the concept of epistasis, which describes how different genes or expressions influence one another, and is therefore able to account for the complex interplay between the variables that affect the durability of concrete.
Criteria for assessing models. To evaluate a model's performance on the training and testing sets, statistical measures such as the mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R 2 ), and normalized root mean squared error (NRMSE) were used. A model's predictive ability is quantified by its R 2 (also referred to as the determination coefficient) 73,74. Improvements in artificial intelligence (AI) modeling approaches have allowed for more precise predictions of concrete's mechanical properties. In this research, the GEP and MEP models are compared statistically by calculating these error criteria. Several measures can help quantify how inaccurate a model is. The coefficient of determination may be used to verify the reliability and validity of the model. Models with R 2 values below 0.50 produce disappointing results, whereas models with R 2 values in the range of 0.65 to 0.75 produce encouraging results. Equation (1) may be used to determine R 2 . The MAE is expressed in the same units as the output variable. A model with an acceptable MAE can still make serious errors on occasion. The MAE is determined using Eq. (2). The RMSE is the square root of the mean squared difference between estimated and measured values. Because the errors are squared before they are averaged, the RMSE gives greater weight to large errors than the MAE does. The lower the RMSE, the better the model forecasts new data. The RMSE is computed using Eq. (3) and is helpful for comparing models of varying complexity. An alternative to the RMSE that accounts for the observed spread of the variable is the NRMSE, which can be thought of as the RMSE expressed as a fraction of the total range of the variable. The NRMSE is calculated using Eq. (4).
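Equations (1) to (4) are not reproduced in this text, so the sketch below uses the standard textbook definitions of the four measures. Two points are assumptions rather than statements from the source: R 2 is computed as one minus the ratio of the residual to the total sum of squares, and the NRMSE is normalized by the observed range (maximum minus minimum), in line with its description as a fraction of the total range. The sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the output variable."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def nrmse(actual, predicted):
    """RMSE divided by the observed range of the actual values (assumed form)."""
    return rmse(actual, predicted) / (max(actual) - min(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed form)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical compressive strengths in MPa, for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]
scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "NRMSE": nrmse(actual, predicted),
    "R2": r_squared(actual, predicted),
}
```

The single 2 MPa miss in this toy data raises the RMSE above the MAE, which illustrates why the RMSE is described as more sensitive to large errors.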
Recently, various researchers have worked on different materials applications, such as civil engineering and sustainability 75-77, prediction of mine water inflow and cement-based materials 78-80, structural engineering applications 81-83, reinforced reservoirs, thermal evolution of chemical structure and concrete beams 84-86, fiber-reinforced soil 87, stress relaxation behavior 88, and embankments and foundations for ballastless high-speed railways 89.

Data collection
Our study relied on experimental testing performed in a laboratory. The PSPB were manufactured with a wide range of plastic-to-sand ratios, sand sizes, fibre percentages, and fibre lengths. The data used in the models were derived from earlier experiments 90,91. The compressive strength was determined through laboratory testing of 135 samples. The materials used to produce the PSPB were plastic, sand and fibres (basalt fibres and coconut fibres). Table 1 displays the input and output parameters considered in this analysis. The studies with the most promising outcomes were selected for further analysis. Seven input parameters (plastic, sand size, length of fibre, sand, percentage of fibre, diameter of fibre and tensile strength of fibre) were chosen from the literature, while all other variables were held constant, and modelling was performed on this data set. Similar approaches were reported in the prior literature, wherein other factors, such as curing regimes, method of preparation, physical and mechanical properties of raw materials, and environmental conditions, were held constant 92-95. Figure 4 and Table 2 present the frequency distributions and general descriptive statistics of the data, respectively. The distribution of the data plays a role in the effectiveness of any model 96. Multiple tests were performed to ascertain the validity and accuracy of the database. The data with the highest error rates were disregarded, while those with the lowest error rates were retained for model prediction 97. In this study, models are trained and tested using the GEP and MEP methods. The models were trained on 80% of the data and then tested on the remaining 20%. The results of this testing complement those of previous experimental testing conducted on a variety of models.
Since the research employs several models, the correctness of each model has previously been confirmed and evaluated using testing data. Genetic evolution was used to train the model, while testing data was used to verify the accuracy of the embedded model 39,98 .
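The 80/20 split described above, and the k-fold partition used for validation in the next section, can be sketched in a few lines of Python. The shuffling seed and the fold count `k = 5` are hypothetical, since neither is stated in the text; only the sample count of 135 and the 80/20 proportions come from the study.

```python
import random


def train_test_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


train, test = train_test_split(135)          # 135 samples, as in the study
folds = list(k_fold_indices(135, k=5))       # k = 5 is an assumed value
```

With 135 samples, the split leaves 108 samples for training and 27 for testing, and each sample appears in exactly one test fold of the k-fold partition.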

Results and discussion
K-fold cross-validation. Researchers from a range of domains have suggested that the ratio of the number of data points to the number of inputs plays a significant role in the effectiveness of a proposed model 96,99. For a reliable model 99, this ratio should be more than 5 so that the data points can be used to test the relationship between the chosen variables. The present study predicts the compressive strength of the PSPB using seven inputs, and the resulting ratio of 19.2 meets this requirement. The k-fold cross-validation of GEP and MEP yielded insightful information on the effectiveness of these methods. For GEP, the maximum R 2 was 0.89, the minimum was 0.72, and the average was 0.81. MEP performed somewhat better, with R 2 ranging from 0.75 to 0.92 and an average of 0.86. These results suggest that both GEP and MEP fit the data adequately, with MEP providing a slightly superior overall fit, as shown in ... 0.076 to 0.059, with an average of 0.066. Figure 5 shows the results of the k-fold cross-validation for both GEP and MEP. NRMSE values closer to zero indicate higher prediction accuracy and lower variability relative to the target variable. In conclusion, when tested using k-fold cross-validation, both GEP and MEP proved highly effective. Overall, MEP performed better than GEP, with lower MAE, RMSE, and NRMSE values and higher R 2 values. These results demonstrate the promise of both GEP and MEP ...
Developing the PSPB empirical equation using GEP. GEP was used to develop an empirical equation for the compressive strength of PSPB. By combining genetic programming with a classic genetic algorithm, GEP forms a powerful evolutionary algorithm.
The objective was to derive a formula that, given a number of input parameters, reliably predicts the compressive strength of the paver blocks. To begin, seven input variables were defined, each selected for its potential impact on compressive strength. The input variables formed the terminal set, and five operators formed the function set. The GEP method produced expression trees (ETs) built from these symbols, using the five operators −, +, ×, ÷, and Ln. The PSPB GEP model's ETs are depicted in Fig. 6. Over the course of several generations, the GEP method iterated and refined the expression trees to arrive at the best possible empirical equation for the compressive strength. Each ET was given a fitness score, and the best individuals were selected, mutated, crossed over, and tested again until a satisfactory solution was found.
Following the identification of the three sub-expression trees (sub-ETs), the final empirical equation for the compressive strength of the plastic sand paver blocks (PSPB) was formed by combining their results. This equation expresses the link between the input factors and the compressive strength, as shown in Eq. (5). It provides helpful insights as well as a prediction tool that can be used in designing and producing durable paver blocks.
where, ... Fig. 7. The error between the experimental and predicted values was also analyzed. The largest error between the predicted and actual compressive strength was 2.28 MPa. On the other hand, the smallest recorded error was 0.08 MPa, which indicates that the actual compressive strength was approximated quite closely in that case. The average error between the predicted and actual values was 0.98 MPa. Figure 8 shows the error distribution between the actual and predicted values. The errors were further classified by magnitude. It was found that 29.6% of the errors were less than 0.5 MPa, which indicates a good level of accuracy in forecasting compressive strength within a narrow band. A reasonable degree of accuracy was indicated by the fact that 48.1% of the errors were within the range of 0.5 MPa to 1.5 MPa. On the other hand, 22.3% of the errors were greater than 1.5 MPa, which indicates that the prediction model needs further improvement. These findings provide evidence that GEP is an effective method for developing an empirical equation for the compressive strength of PSPB.
Developing the PSPB equation using MEP. In this part, multi-expression programming models are developed to predict the compressive strength of PSPB from the seven input parameters. Equation (6) gives the empirical equation obtained from the ETs for the compressive strength of PSPB, which can be used to estimate the compressive strength. The ETs are built from the same five operators as before, i.e., −, +, ×, ÷, and Ln. Figure 9 depicts the actual and predicted values of the MEP model. The errors between the actual and predicted values were also analyzed, and the error distribution of the MEP model is shown in Fig. 10. The largest recorded error between the predicted and actual compressive strength was 2.09 MPa. On the other hand, the smallest error was 0.03 MPa, which indicates that the actual compressive strength was approximated quite closely in that case. The average error between the predicted and actual values was 1 MPa. The errors were further classified by magnitude. It was found that 22.2% of the errors were less than 0.5 MPa, which indicates a high degree of accuracy in forecasting compressive strength within a narrow band. This was helped by the narrow range of the data. Errors in the range of 0.5 MPa to 1.5 MPa made up 59.3% of the total, which indicates that a sizeable share of the forecasts had moderate accuracy.
On the other hand, errors larger than 1.5 MPa accounted for 18.5% of the total, a smaller share of large errors than for GEP (22.3%). These findings provide evidence that multi-expression programming is a viable method for developing an empirical equation for the compressive strength of plastic sand paver blocks. The accuracy of the model appears promising, given that it has a high R 2 value and the bulk of its predictions are within acceptable error limits.
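The error banding reported for the GEP and MEP models can be reproduced with a short routine like the one below. It is a sketch with hypothetical input values, not the study's data; only the band limits of 0.5 MPa and 1.5 MPa are taken from the text. As a side observation, the reported shares are consistent with a test set of 27 samples (20% of 135): for example, 8, 13 and 6 samples would give about 29.6%, 48.1% and 22.2%. This is an inference from the percentages, not something the text states.

```python
def error_bands(actual, predicted, low=0.5, high=1.5):
    """Summarize absolute errors and the share falling in each band."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    n = len(errors)
    small = sum(1 for e in errors if e < low)             # below `low`
    medium = sum(1 for e in errors if low <= e <= high)   # `low` to `high`
    large = n - small - medium                            # above `high`
    return {
        "max_error": max(errors),
        "min_error": min(errors),
        "mean_error": sum(errors) / n,
        "pct_small": 100.0 * small / n,
        "pct_medium": 100.0 * medium / n,
        "pct_large": 100.0 * large / n,
    }


# Hypothetical compressive strengths in MPa, for illustration only.
actual = [20.0, 22.0, 24.0, 26.0]
predicted = [20.2, 23.0, 26.0, 25.9]
bands = error_bands(actual, predicted)
```

Treating the limits as inclusive for the middle band is a choice made here; the text does not say how errors of exactly 0.5 MPa or 1.5 MPa were assigned.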
Sensitivity analysis. Sensitivity analysis is a useful method for evaluating the effect of varying input variables on the predicted outcome of a model. This technique is essential for understanding the model's behavior and dependability 102. To begin the sensitivity analysis, the problem must be defined precisely and the input variables that influence the model's output identified. After identifying the variables, the next stage is to determine the range of possible values for each input variable. This range should include reasonable and meaningful values for the parameters under consideration. By examining various values within the defined ranges, sensitivity analysis allows the relative importance and impact of each input variable on the model's output to be assessed. This process helps determine which variables have the greatest influence on the predictions and supports well-informed decisions based on the behavior of the model. For PSPB, a sensitivity analysis was carried out to determine the impact of the different input variables on compressive strength. Recently, various researchers have worked on different materials applications, such as civil engineering and sustainability 95-97, prediction of mine water inflow and cement-based materials 98-100, structural engineering applications 75,101,102, reinforced reservoirs, thermal evolution of chemical structure and concrete beams 76-78, fiber-reinforced soil 79, stress relaxation behavior 80, and embankments and foundations for ballastless high-speed railways 81. Equations (7) and (8) were used to carry out the sensitivity analysis.
where f_min(x_i) and f_max(x_i) are the minimum and maximum outputs of the forecasting model, respectively, obtained when the i-th input is varied over its range while all other inputs are kept fixed. The findings give the percentage contribution attributable to each input, shedding light on the relative importance of the variables. Sand size had the largest contribution, around 29.57%, demonstrating its strong effect on the performance of the blocks. The percentage of fibres in the blocks also had a significant influence, with a contribution of around 21.98%. Other parameters, such as fibre diameter (16.32%), fibre tensile strength (6.87%), and fibre length (4.77%), also contributed to the compressive strength of the plastic sand paver blocks. These findings give useful insights for optimizing the manufacture and composition of plastic sand paver blocks. Figure 11 shows that all of the variables play a role in predicting PSPB's compressive strength.
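Equations (7) and (8) are not reproduced in this text. A form commonly used with the definitions above, and assumed in the sketch below, takes the output span N_i = f_max(x_i) − f_min(x_i) for each input and reports its share SA_i = N_i / Σ N_j of the summed spans. The Python example applies this to a hypothetical two-input model; the function names, the baseline values, and the number of sampling steps are all illustrative choices.

```python
def sensitivity(model, ranges, baseline, n_steps=10):
    """Percentage contribution of each input under the assumed span formula.

    model    -- callable taking a dict of input values and returning a number
    ranges   -- {input_name: (low, high)} range explored for each input
    baseline -- {input_name: value} values used for inputs that are held fixed
    """
    spans = {}
    for name, (low, high) in ranges.items():
        outputs = []
        for step in range(n_steps + 1):
            point = dict(baseline)                       # hold other inputs fixed
            point[name] = low + (high - low) * step / n_steps
            outputs.append(model(point))
        spans[name] = max(outputs) - min(outputs)        # N_i = f_max - f_min
    total = sum(spans.values())
    return {name: 100.0 * span / total for name, span in spans.items()}


def toy_model(x):
    """Hypothetical stand-in for a fitted GEP or MEP equation."""
    return 3.0 * x["a"] + 1.0 * x["b"]


shares = sensitivity(
    toy_model,
    ranges={"a": (0.0, 1.0), "b": (0.0, 1.0)},
    baseline={"a": 0.5, "b": 0.5},
)
```

By construction the shares sum to 100%, which matches the way the contributions are reported above: sand size and fibre percentage together give 29.57% + 21.98% = 51.55%.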

Conclusion
No previous research has used the GEP and MEP methods to derive an empirical equation for PSPB. To address this gap and generate an accurate expression for predicting the compressive strength of PSPB, the current work used the GEP and MEP machine learning methods. The generalizability of the constructed models was evaluated using extensive statistical, k-fold, and sensitivity analyses. The GEP and MEP models were compared using linear and non-linear regression expressions. The particular findings of this study are as follows.
• The compressive strength R 2 values of 0.87 for GEP and 0.91 for MEP indicate a relatively strong correlation between predicted and actual values. In terms of R 2 , MEP outperformed GEP, indicating a superior fit to the data.
• MEP developed a unique mathematical equation to predict compressive strength, indicating that it was more effective than GEP at capturing the underlying patterns and relationships in the data.
• The statistical error measures (MAE, RMSE, and NRMSE) were lower for MEP (0.983, 1.158, and 0.066) than for GEP (1.007, 1.174, and 0.069), indicating greater precision in predicting compressive strength.
• The results of k-fold cross-validation consistently showed that MEP outperformed GEP in compressive strength prediction, which demonstrates the model's robustness and generalizability.
• According to the sensitivity analysis, sand size and fibre percentage together accounted for roughly half of the total contribution to compressive strength, with the other five input parameters making up the remainder. This emphasizes the significance of controlling and optimizing these variables to increase PSPB's compressive strength.
The developed models can be used to estimate the compressive strength of PSPB for a variety of input parameter values, saving time and money on future trials. This study was limited to seven fundamental variables (P, S, SS, FbL, Fb, FbD and FbT) for developing the prediction models. However, other factors such as curing regime, method of preparation, and environmental conditions also affect the strength of a material. Therefore, further studies are required to build a more comprehensive database that includes all potentially influential parameters for developing strength prediction models.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.